Abstract: We describe the structure of optimal input covariance
matrices for single-user multiple-input/multiple-output (MIMO)
communication systems with covariance feedback and general
correlated fading. Our approach is based on the novel concept of
the right commutant and recovers previously derived results for
the Kronecker product models. Conditions are derived which allow
a significant simplification of the optimization problem.

I. Introduction 

Since the seminal work of Telatar [1] on the Shannon
capacity of multi-antenna wireless systems, this area has
attracted a lot of attention. The development started with
the investigation of the capacity of single-user MIMO systems.
Many results on the capacity for different types of channel
state information at the transmitter and/or receiver are known.
The progress achieved in this field was a key reason why
MIMO techniques are already used in existing systems. One
important research topic on MIMO systems is the impact of
correlation of the channel matrix on the achievable capacity
[2]-[8]. Many results are known in this area, but most
works assume that the channel covariance matrix is the
Kronecker product of the covariance matrices of the transmit
and receive antennas [3], [4]. In this paper the general case
is analyzed.

The paper is organized as follows: In Section II we briefly
review the model and formulate the main problem. Additionally,
we divide the set of variance matrices into two classes,
separable and entangled positive semidefinite matrices, a
definition borrowed from quantum information theory. This
separation helps us to present our results first for the class
of separable matrices, which is easier to deal with, followed
by an extension of the results to entangled matrices. Section
III starts with the novel concept of the right commutant, which
is the key ingredient in our approach. It can be seen as a
characterization of one-sided invariant subspaces for the given
channel variance matrix (cf. Lemma 3.1) or, alternatively,
as a description of symmetries of the channel variance matrix
(cf. Lemma 3.5, part 1). Our subsequent results in Section III
rely heavily on that concept, which, combined in an appropriate
way with some simple concavity considerations (cf. footnote 1),
turns out to be a rather powerful tool. For example, we do not
need any majorization results or considerations, which are the
basis of the results in [7], [8]. Our main result, Theorem 3.3,
is a characterization of optimal input covariance matrices.
Notation and Preliminaries: We denote matrices by
capital letters, e.g. $H$. The hermitian conjugate (adjoint) is
denoted by $(\cdot)^H$, while $(\cdot)^t$ is reserved for the
transpose of a matrix. The set of $N \times N$ matrices with
complex entries is abbreviated by $M(N,\mathbb{C})$, and
$A \otimes B$ denotes the tensor product (Kronecker product) of
the matrices $A$ and $B$. $1_N$ is the $N \times N$ identity
matrix. $\mathrm{diag}(Q_1,\dots,Q_c)$ is shorthand for the matrix
which has the matrices $Q_1,\dots,Q_c$ as its diagonal blocks and
zeros elsewhere; the size of the diagonal blocks will be specified
in each particular case. $\mathrm{tr}(A)$ is the trace of the
matrix $A$, and $H \sim \mathcal{N}(0,\Sigma)$ means that the
complex-valued random matrix $H$ of prescribed size is normally
distributed with mean $0$ and variance $\Sigma$.

We introduce some simple concepts from the theory of
*-algebras of matrices which will be helpful in this paper (cf. [9],
chap. I, for more information). A *-algebra $\mathcal{A}$ in
$M(N,\mathbb{C})$ is a linear subspace which is closed under matrix
multiplication and under the $(\cdot)^H$-operation. It can be shown
[9] that each *-algebra of matrices has a multiplicative unit. The
*-algebras appearing in this paper have $1_N$ as the unit element
with respect to matrix multiplication. An (orthogonal) projection
$P \neq 0$ is called a minimal projection in $\mathcal{A}$ if
$P \in \mathcal{A}$ and $Q \le P$ for any projection
$Q \in \mathcal{A}$ implies $Q = 0$ or $P = Q$.
Equivalently, a non-zero projection $P \in \mathcal{A}$ is minimal
if and only if $P\mathcal{A}P = \mathbb{C}P$. By a resolution of
identity in $\mathcal{A}$ we mean a set of mutually orthogonal
projections $\{P_i\}_{i=1}^{c} \subset \mathcal{A}$ that satisfies
$\sum_{i=1}^{c} P_i = 1$, where $1$ denotes the multiplicative
unit in $\mathcal{A}$.

Footnote 1: After finishing this paper we learned that Tulino,
Lozano and Verdú [14] used the concavity of the capacity in a
similar way to characterize optimal covariances for channels with
independent columns and symmetric joint distribution.

If $A \in \mathcal{A} \subset M(N,\mathbb{C})$ is a hermitian or
normal matrix, then we can represent it according to the spectral
theorem as $A = \sum_{\lambda \in \sigma(A)} \lambda P_\lambda$,
where $\sigma(A)$ denotes the spectrum (set of eigenvalues) and
$P_\lambda$ is the projection onto the eigenspace corresponding to
the eigenvalue $\lambda$. By the defining properties of a
*-algebra, with $A \in \mathcal{A}$ we also have
$g(A) \in \mathcal{A}$ for each complex-valued polynomial $g$. It
is easily seen that for each $\lambda \in \sigma(A)$ there is a
complex-valued polynomial $g_\lambda$ with
$g_\lambda(A) = P_\lambda$, and hence $P_\lambda \in \mathcal{A}$
for all $\lambda \in \sigma(A)$, a fact which will be useful in
the proof of Lemma 3.1 below.
Finally, we recall a way of viewing a tensor product of
matrices as a linear map, which will be needed in the last part
of the paper: For $A \in M(M,\mathbb{C})$, $B \in M(N,\mathbb{C})$
we consider the tensor product $A \otimes B$ and an $M \times N$
matrix $H$. Then it is easily seen, using rank-one $M \times N$
matrices, that the canonical action of $A \otimes B$ on $H$ is
given by $(A \otimes B)(H) = A H B^t$. This action extends to
arbitrary elements of $M(M,\mathbb{C}) \otimes M(N,\mathbb{C})$
by linearity, since each
$\Sigma \in M(M,\mathbb{C}) \otimes M(N,\mathbb{C})$ can be
written as a complex linear combination of such elementary
tensors $A \otimes B$.
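The identification of $A \otimes B$ with the map $H \mapsto A H B^t$ can be checked numerically. The following is a minimal sketch, assuming NumPy and the row-major flattening of $H$ as the identification of $M \times N$ matrices with vectors in $\mathbb{C}^M \otimes \mathbb{C}^N$; the helper `crandn` and the sizes are illustrative choices, not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 3, 2

def crandn(*shape):
    """Complex Gaussian test matrix (illustrative helper)."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

A, B, H = crandn(M, M), crandn(N, N), crandn(M, N)

# Row-major flattening identifies the M x N matrix H with a vector
# in C^M (x) C^N, so that np.kron(A, B) acts on it.
lhs = (np.kron(A, B) @ H.reshape(-1)).reshape(M, N)
rhs = A @ H @ B.T          # plain transpose, no complex conjugation

action_matches = np.allclose(lhs, rhs)
```

With a column-major flattening the roles of $A$ and $B$ in the Kronecker product would be swapped, so the convention matters when comparing with other texts.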

II. Model and Problem Formulation 

We focus on a single point-to-point wireless communication
system using $N$ transmit and $M$ receive antennas. We assume
that the behavior of the channel can be described by the
well-known narrow-band flat fading channel model, i.e.

$$ y = Hx + n, $$

where $x$ is the $N$-dimensional transmit vector, $y$ is the
$M$-dimensional receive vector, $H$ is the $M \times N$ channel
matrix, and the $M$ components $n_k$ of the noise vector $n$ are
assumed to be i.i.d. complex circularly symmetric Gaussian
distributed with mean $0$ and variance $\sigma_n^2$. For the
channel matrix $H$ we first use a correlation model that is more
general than the one in [7], [8]; this lets us present our ideas in
the most transparent way and allows a direct comparison with the
existing results. Then we shall show that this correlation model
already captures the full complexity of the general case. The
channel matrix in this special case can be described as follows:

$$ H = \sum_{i=1}^{s} R_i^{1/2} W_i T_i^{1/2}, \tag{1} $$

where the $W_i$ are mutually independent complex Gaussian
$M \times N$ matrices with i.i.d. zero-mean entries, and the
positive semidefinite $M \times M$ resp. $N \times N$ matrices
$R_i$ resp. $T_i$ are related to the variance $\Sigma$ of $H$ by

$$ \Sigma = \sum_{i=1}^{s} R_i \otimes T_i, \tag{2} $$

where $\Sigma := \mathbb{E}(H \otimes \overline{H})$, which has
components $\mathbb{E}(H_{i,j}\overline{H_{l,m}})$.
Observe that, since we are dealing with complex matrices,
$A \ge 0$ implies that $A$ is hermitian.

Remark: Note that such decompositions into a sum of tensor
products of positive semidefinite (PSD) matrices are,
in general, non-unique: a simple example is given in the
symmetric case of two transmit and two receive antennas with
the variance matrix $\Sigma = 1_2 \otimes 1_2$, which can
alternatively be decomposed into
$\Sigma = \sum_{i=1}^{2} 1_2 \otimes e_i e_i^H$, $\{e_1, e_2\}$
being any orthonormal basis in $\mathbb{C}^2$. This
non-uniqueness with respect to decompositions corresponds to the
freedom of choice in the particular realization of random
variables distributed according to a given probability
distribution.
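The non-uniqueness in the remark is easy to confirm numerically. The sketch below assumes NumPy and draws an arbitrary orthonormal basis of $\mathbb{C}^2$ from a QR factorisation; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# A random orthonormal basis {e_1, e_2} of C^2: the columns of a unitary matrix.
Z = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
E, _ = np.linalg.qr(Z)

one = np.eye(2)
sigma = np.kron(one, one)                       # Sigma = 1_2 (x) 1_2

# Alternative decomposition: sum_i 1_2 (x) e_i e_i^H.
alt = sum(np.kron(one, np.outer(E[:, i], E[:, i].conj())) for i in range(2))

decompositions_agree = np.allclose(sigma, alt)
```

Both decompositions use PSD summands only, so they are two different separable representations of the same variance matrix.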

PSD matrices acting on $\mathbb{C}^M \otimes \mathbb{C}^N$ that
allow a decomposition as in (2) with PSD summands are called
separable in quantum information theory. Otherwise we say that
they are entangled (cf. [11], [10] and references therein). The
simplest example of an entangled PSD matrix is given by $gg^H$,
where $g := e_1 \otimes e_1 + e_2 \otimes e_2$ and
$\{e_1, e_2\}$ is the canonical basis of $\mathbb{C}^2$.
A handy sufficient criterion for separability of a given PSD
matrix over $\mathbb{C}^M \otimes \mathbb{C}^N$ is given in [10]:

Theorem 2.1 (Gurvits/Barnum): A PSD matrix $\Sigma$ is
separable if $\|\Sigma - 1_M \otimes 1_N\|_2 \le 1$, where
$\|\cdot\|_2$ denotes the Hilbert-Schmidt norm on matrices (i.e.
$\|A\|_2 := \sqrt{\langle A, A\rangle_{HS}} := \sqrt{\mathrm{tr}(A^H A)}$).
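The criterion of Theorem 2.1 amounts to one norm computation. The following sketch, assuming NumPy, evaluates it for the entangled example $gg^H$ and for a small perturbation of the identity; the function names and the perturbation are illustrative. Since the criterion is only sufficient, a failed test does not by itself prove entanglement.

```python
import numpy as np

def hs_norm(a):
    """Hilbert-Schmidt norm sqrt(tr(A^H A))."""
    return np.sqrt(np.trace(a.conj().T @ a).real)

def passes_ball_test(sigma):
    """True if ||Sigma - 1||_2 <= 1, the sufficient condition of Theorem 2.1."""
    return hs_norm(sigma - np.eye(sigma.shape[0])) <= 1.0

e1, e2 = np.eye(2)                               # canonical basis of C^2
g = np.kron(e1, e1) + np.kron(e2, e2)
entangled = np.outer(g, g.conj())                # the example g g^H from the text

# A PSD matrix close to the identity (illustrative choice).
near_identity = np.eye(4) + 0.2 * np.kron(np.diag([1.0, -1.0]),
                                          np.diag([1.0, 0.0]))

inside_ball = passes_ball_test(near_identity)    # criterion applies
outside_ball = passes_ball_test(entangled)       # criterion is silent here
```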

In this paper we assume that the receiver knows
the channel perfectly and that the transmitter only has knowledge
of the channel covariance matrix $\Sigma$. As a consequence, the
channel state information at the transmitter is a deterministic
function of the channel state information at the receiver. Under
this condition the ergodic capacity of the considered MIMO
system is given by

$$ C = \max_{\mathrm{tr}(Q) \le p,\; Q \ge 0} \mathbb{E}\left(\log\det\left(1_M + \frac{1}{\sigma_n^2} H Q H^H\right)\right), \tag{3} $$

as is easily seen using the results of [12]. The optimization
problem (3) is a smooth convex optimization problem. The
capacity $C = C(Q)$ for an optimal transmit covariance matrix
$Q$ is achieved by transmitting independent complex circular
Gaussian symbols along the eigenvectors of $Q$, with the powers
allocated according to the eigenvalues of the matrix $Q$ [5]-[8].
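The objective in (3) can be estimated by simulation for a fixed $Q$. The sketch below is a plain Monte Carlo estimate, assuming NumPy and, for simplicity, a single Kronecker term $H = R^{1/2} W T^{1/2}$ with unit-variance entries of $W$; the function names, sample count and example matrices are assumptions made for illustration, and the result is in nats.

```python
import numpy as np

def psd_sqrt(A):
    """Hermitian square root of a PSD matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(np.clip(w, 0, None))) @ V.conj().T

def ergodic_rate(Q, R, T, sigma2=1.0, samples=2000, seed=0):
    """Monte Carlo estimate of E log det(1_M + H Q H^H / sigma2)
    for H = R^{1/2} W T^{1/2}, W with i.i.d. CN(0, 1) entries."""
    rng = np.random.default_rng(seed)
    M, N = R.shape[0], T.shape[0]
    Rh, Th = psd_sqrt(R), psd_sqrt(T)
    total = 0.0
    for _ in range(samples):
        W = (rng.standard_normal((M, N))
             + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
        H = Rh @ W @ Th
        G = np.eye(M) + H @ Q @ H.conj().T / sigma2
        total += np.linalg.slogdet(G)[1]         # log|det G|, G is positive definite
    return total / samples

R = np.eye(2)
T = np.diag([1.5, 0.5])
p = 1.0
uniform = ergodic_rate(p / 2 * np.eye(2), R, T)      # equal power on both eigenvectors
beamform = ergodic_rate(np.diag([p, 0.0]), R, T)     # all power on the strongest one
```

Comparing such estimates for different feasible $Q$ gives a rough picture of the objective, but it does not replace solving the convex problem (3).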

III. Results 

For a given variance matrix
$\Sigma \in M(M,\mathbb{C}) \otimes M(N,\mathbb{C})$ we
define the "right" commutant

$$ C_\Sigma := \{A \in M(N,\mathbb{C}) \mid (1_M \otimes A)\Sigma = \Sigma(1_M \otimes A)\}, $$

and consider any resolution of identity consisting of mutually
orthogonal minimal projections in $C_\Sigma$, i.e.
$1_N = \sum_{i=1}^{c} P_i$ with $P_i \in C_\Sigma$ minimal and
$P_iP_j = \delta_{ij}P_i$.
Example 1. If the variance matrix is given by
$\Sigma = R \otimes T$, then we have
$C_\Sigma = \{A \in M(N,\mathbb{C}) \mid AT = TA\}$, and each set
of mutually orthogonal minimal projections in $C_\Sigma$ adding to
$1_N$ is given by projections onto the one-dimensional subspaces
spanned by the eigenvectors of $T$.
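Since the defining relation of $C_\Sigma$ is linear in $A$, the right commutant can be computed as the kernel of a linear map. The sketch below, assuming NumPy, does this with an SVD and applies it to the situation of Example 1; the function name, the tolerance and the concrete $R$ and $T$ are illustrative choices.

```python
import numpy as np

def right_commutant_basis(sigma, M, N, tol=1e-10):
    """Basis of {A in M(N,C) : (1_M (x) A) sigma = sigma (1_M (x) A)}."""
    cols = []
    for k in range(N):
        for l in range(N):
            E = np.zeros((N, N), dtype=complex)
            E[k, l] = 1.0                        # matrix unit E_kl
            big = np.kron(np.eye(M), E)
            cols.append((big @ sigma - sigma @ big).reshape(-1))
    L = np.array(cols).T                         # (MN)^2 x N^2 matrix of the map
    _, s, Vh = np.linalg.svd(L)
    rank = int(np.sum(s > tol))
    null = Vh[rank:].conj().T                    # columns span the kernel
    return [null[:, i].reshape(N, N) for i in range(null.shape[1])]

# Example 1 with M = 2, N = 3: T has three distinct eigenvalues.
R = np.diag([2.0, 1.0])
T = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])
basis = right_commutant_basis(np.kron(R, T), M=2, N=3)
dimension = len(basis)
all_commute_with_T = all(np.allclose(A @ T, T @ A) for A in basis)
```

For this $T$ the commutant is spanned by the three spectral projections of $T$, so the computed dimension equals $N$.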

Some simple observations concerning the concept of the right
commutant are collected in the following lemma.

Lemma 3.1: Let $\Sigma$ be a PSD matrix in
$M(M,\mathbb{C}) \otimes M(N,\mathbb{C})$. Then we have:

1. $C_\Sigma$ is a subalgebra of $M(N,\mathbb{C})$ containing
$1_N$ which is closed under the $(\cdot)^H$-operation, i.e.
$C_\Sigma$ is a *-algebra.

2. Let $\{P_i\}_{i=1}^{u}$ and $\{Q_j\}_{j=1}^{v}$ be resolutions
of identity consisting of minimal projections in $C_\Sigma$. Then
$u = v$ and there is a permutation $\pi$ of $\{1,\dots,u\}$ such
that $\mathrm{tr}(P_i) = \mathrm{tr}(Q_{\pi(i)})$ for all
$i \in \{1,\dots,u\}$.

3. If $\Sigma$ is separable and if $\{P_j\}_{j=1}^{u}$ is a
resolution of identity consisting of minimal projections in
$C_\Sigma$, then there is a decomposition of $\Sigma$ into a sum
of tensor products of PSD matrices

$$ \Sigma = \sum_{i=1}^{s} R_i \otimes T_i $$

satisfying $T_iP_j = P_jT_i$ for all $i \in \{1,\dots,s\}$ and
$j \in \{1,\dots,u\}$.

Remark: Our right commutant $C_\Sigma$ is a close relative of the
concept of the commutant, which is widely used in the theory
of operator algebras and quantum information theory. Indeed,
the proof of the properties stated in Lemma 3.1 consists of some
standard conclusions, at least for those already familiar with
the usual commutant from the theory of operator algebras. For
ease of reading we include this short proof.
Proof of Lemma 3.1: The first item is easily checked by
inspection and is standard in the theory of matrix (operator)
algebras (cf. [9]). For the second item, note that each
$P_iQ_jP_i$ is hermitian and contained in $C_\Sigma$. It is well
known that then all spectral projections of $P_iQ_jP_i$ are also
contained in $C_\Sigma$. Using this fact it is easy to deduce a
contradiction to the assumed minimality of the involved
projections unless $u = v$. The second part is then easily
obtained.
The third item follows from the relation

$$ \Sigma = \sum_{j=1}^{u} (1_M \otimes P_j)\,\Sigma\,(1_M \otimes P_j), $$

combined with $\Sigma = \sum_{i=1}^{s} R_i' \otimes T_i'$, where
$R_i'$ and $T_i'$ are PSD, which is ensured by separability of
$\Sigma$. Indeed, we merely have to set

$$ R_i := R_i' \quad\text{and}\quad T_i := \sum_{j=1}^{u} P_j T_i' P_j, $$

and we arrive at the desired conclusion of the lemma. □
Remark: As we will show in the following, the minimal
projections $\{P_i\}_{i=1}^{u}$ serve as the starting point of a
block-diagonalization procedure for optimal input covariance
matrices. The second part of Lemma 3.1 ensures that no
particular minimal resolution of identity is preferred,
i.e. the dimensions of the ranges of the considered
projections are equal up to a permutation.

Unfortunately, there are cases where the algebra $C_\Sigma$ is
trivial, i.e. consists of complex multiples of $1_N$, as the
following example shows:

Example 2. Let $M = 2 = N$ and
$\Sigma = e_1e_1^H \otimes e_1e_1^H + e_2e_2^H \otimes gg^H$,
where $\{e_1, e_2\}$ denotes the canonical basis in
$\mathbb{C}^2$ and $g = \frac{1}{\sqrt{2}}(e_1 + e_2)$. Let
$P \in C_\Sigma$ be a projection; then we have
$(1_M \otimes P)\Sigma = \Sigma(1_M \otimes P)$. Inserting the
expression for $\Sigma$ above into this relation and multiplying
with $e_ie_i^H \otimes 1_N$ for $i = 1, 2$, we end up with the two
equations $e_1e_1^HP = Pe_1e_1^H$ and $gg^HP = Pgg^H$. A simple
calculation shows that $P = a1_N$ with $a \in \mathbb{R}_+$ and
hence $P = 0$ or $P = 1_N$.
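The simple calculation in Example 2 can be spelled out as follows. This is a sketch in the canonical basis; the entry names $a$, $b$, $c$ are introduced here only for illustration.

```latex
\begin{align*}
P &= \begin{pmatrix} a & \bar c \\ c & b \end{pmatrix}, \qquad a, b \in \mathbb{R},
  && \text{(hermitian projection)} \\
e_1e_1^H P &= \begin{pmatrix} a & \bar c \\ 0 & 0 \end{pmatrix}, \quad
P e_1e_1^H = \begin{pmatrix} a & 0 \\ c & 0 \end{pmatrix}
  && \Longrightarrow\ c = 0,\ P = \mathrm{diag}(a, b), \\
gg^H &= \tfrac12 \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}, \quad
gg^H P = \tfrac12 \begin{pmatrix} a & b \\ a & b \end{pmatrix}, \quad
P gg^H = \tfrac12 \begin{pmatrix} a & a \\ b & b \end{pmatrix}
  && \Longrightarrow\ a = b, \\
P^2 &= P \ \text{with}\ P = a 1_N
  && \Longrightarrow\ a \in \{0, 1\}.
\end{align*}
```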

In the following we separate our presentation into two parts: in
the first we consider separable variance matrices, while in
the second no restrictions on the channel matrices $H$ are assumed.
This separation, although not necessary from the viewpoint
of mathematics, has the advantage that we can first present
our ideas in a situation which is close in spirit to the
previous work of Jafar/Vishwanath/Goldsmith [6], [7] and
Jorswieck/Boche [8], and then show that the result extends
immediately to the general case.

A. Optimal Input Covariance Matrices: Separable Case 

Now we can describe the optimal input matrix in the case
where $\Sigma$ is separable and $C_\Sigma$ contains non-trivial
minimal projections, i.e. minimal projections not equal to $1_N$.

Choose any resolution of identity consisting of mutually
orthogonal minimal projections in $C_\Sigma^t$ (the transpose of
$C_\Sigma$), denoted by $\{P_j\}_{j=1}^{c}$, and a decomposition
of $\Sigma$ with the properties given in Lemma 3.1, item 3, with
respect to $\{P_j^t\}_{j=1}^{c}$, a resolution of identity
consisting of minimal projections in $C_\Sigma$. Then there
is a unitary $U$ such that
$T_i = U\,\mathrm{diag}(T_i(1),\dots,T_i(c))\,U^H$ for
all $i \in \{1,\dots,s\}$, where the matrices $T_i(j)$ map the
range of $P_j$ into itself, i.e. each $T_i$ is block-diagonal in
the basis given by the unitary matrix $U$.

Theorem 3.2: Suppose that the variance matrix $\Sigma$ of
$H \sim \mathcal{N}(0,\Sigma)$ is separable and that
$C_\Sigma \neq \mathbb{C}\cdot 1_N$. Then the capacity
achieving covariance matrix $Q$ can be chosen such that

$$ Q = U\,\mathrm{diag}(Q_1,\dots,Q_c)\,U^H, $$

where each $Q_j$ maps the range of $P_j$ into itself,
$j \in \{1,\dots,c\}$.
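Proof: Suppose that we are given any capacity achieving
covariance matrix $Q$, i.e.

$$ C = C(Q) = \mathbb{E}\left(\log\det\left(1_M + \frac{HQH^H}{\sigma_n^2}\right)\right). $$

Due to our system assumption (1), the last expression can be
written as

$$ C = \mathbb{E}\left(\log\det\left(1_M + \frac{\sum_{i,l=1}^{s} R_i^{1/2} W_i T_i^{1/2}\, Q\, T_l^{1/2} W_l^H R_l^{1/2}}{\sigma_n^2}\right)\right). $$

Now we insert the relation

$$ T_i^{1/2} = U\,\mathrm{diag}\big(T_i(1)^{1/2},\dots,T_i(c)^{1/2}\big)\,U^H =: U\tilde T_i^{1/2}U^H, $$

with $\tilde Q := U^HQU$ fulfilling
$\mathrm{tr}(\tilde Q) = \mathrm{tr}(Q)$, and arrive at

$$ C = \mathbb{E}\left(\log\det\left(1_M + \frac{\sum_{i,l=1}^{s} R_i^{1/2} W_i \tilde T_i^{1/2}\, \tilde Q\, \tilde T_l^{1/2} W_l^H R_l^{1/2}}{\sigma_n^2}\right)\right) =: \tilde C(\tilde Q), \tag{4} $$

where we have used that the random matrices $W_i$ and $W_iU$
have the same probability distribution, since each $W_i$ has
i.i.d. Gaussian entries and the $W_i$ are jointly independent.
The transformed matrix $\tilde Q$ can be written as a block
matrix with respect to the transformation $U$ induced by the set
$\{P_j\}_{j=1}^{c}$ of minimal projections in $C_\Sigma^t$:

$$ \tilde Q = \begin{pmatrix}
\tilde Q_{11} & \tilde Q_{12} & \cdots & \tilde Q_{1c} \\
\tilde Q_{21} & \tilde Q_{22} & \cdots & \tilde Q_{2c} \\
\vdots & \vdots & \ddots & \vdots \\
\tilde Q_{c1} & \tilde Q_{c2} & \cdots & \tilde Q_{cc}
\end{pmatrix}. $$

We consider the unitary and hermitian matrix

$$ U_1 := \mathrm{diag}(1_{P_1}, -1_{P_2}, -1_{P_3}, \dots, -1_{P_c}), $$

where $1_{P_j}$ denotes the matrix acting as the identity on the
range of $P_j$. Then we have $U_1\tilde T_iU_1 = \tilde T_i$,

$$ Q_1 := \frac{1}{2}\big(\tilde Q + U_1\tilde QU_1\big) = \begin{pmatrix} \tilde Q_{11} & 0 \\ 0 & \hat Q_2 \end{pmatrix}, $$

where $\hat Q_2$ collects the blocks $\tilde Q_{jk}$ with
$j, k \ge 2$, and $\mathrm{tr}(\tilde Q) = \mathrm{tr}(Q_1)$.

Due to the concavity of the functional $\tilde C$ defined by the
last equation in (4) we end up with

$$ C \ge \tilde C(Q_1) \ge \frac12\tilde C(\tilde Q) + \frac12\tilde C(U_1\tilde QU_1) = \frac12\tilde C(\tilde Q) + \frac12\tilde C(\tilde Q) = C, \tag{5} $$

where we have used $U_1\tilde T_iU_1 = \tilde T_i$ in the first
equality. In the next step we consider the unitary and hermitian
matrix $U_2$ given by

$$ U_2 := \mathrm{diag}(1_{P_1}, 1_{P_2}, -1_{P_3}, \dots, -1_{P_c}), $$

and can define in a similar way a matrix
$Q_2 := \frac12(Q_1 + U_2Q_1U_2)$ and show analogously that
$\tilde C(Q_2) = C$ holds.
Continuing this procedure we arrive at the claimed conclusion
of the theorem. □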
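The averaging step used in the proof can be illustrated numerically. The sketch below assumes NumPy and works directly in the basis given by $U$, so that the input plays the role of $\tilde Q$; the block sizes and the function name are illustrative. It applies $Q \mapsto \frac12(Q + U_jQU_j)$ for $j = 1,\dots,c-1$ and checks that only the diagonal blocks survive.

```python
import numpy as np

def block_diagonalize(Q, sizes):
    """Apply Q -> (Q + U_j Q U_j)/2 with U_j = diag(+1 on blocks 1..j, -1 on the rest)."""
    edges = np.cumsum(sizes)
    for j in range(len(sizes) - 1):
        signs = np.where(np.arange(Q.shape[0]) < edges[j], 1.0, -1.0)
        U = np.diag(signs)                       # unitary and hermitian reflection
        Q = 0.5 * (Q + U @ Q @ U)
    return Q

rng = np.random.default_rng(2)
Z = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
Q = Z @ Z.conj().T                               # a generic PSD input covariance
sizes = [2, 1, 2]                                # ranks of P_1, P_2, P_3 (illustrative)
Q_blk = block_diagonalize(Q, sizes)

# Mask that keeps exactly the diagonal blocks.
mask = np.zeros((5, 5), dtype=bool)
start = 0
for n in sizes:
    mask[start:start + n, start:start + n] = True
    start += n

only_diagonal_blocks = np.allclose(Q_blk, np.where(mask, Q, 0))
trace_preserved = np.isclose(np.trace(Q_blk), np.trace(Q))
still_psd = np.linalg.eigvalsh(Q_blk).min() >= -1e-10
```

Each step is an average of two unitarily equivalent PSD matrices, which is why the trace constraint and positive semidefiniteness are preserved throughout.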

Note that, as mentioned previously, in the case
$\Sigma = R \otimes T$ the resolution of identity
$\{P_j\}_{j=1}^{c}$ consists of one-dimensional projections, i.e.
$c = N$, and we recover the results of [7], [8] that the optimal
transmission strategy consists of sending independent circularly
symmetric Gaussian inputs along the eigenvectors of $T$.

B. Optimal Input Covariance Matrices: General Case 

If we carefully examine our construction in the proof of
Theorem 3.2, we see that we have needed only the concavity of
the capacity functional together with the fact that
$U_j\tilde T_iU_j = \tilde T_i$, which means that applying $U_j$
does not change the probability distribution of the considered
random matrix $H$. Hence, in order to extend our proof to the
case of general random matrices $H \sim \mathcal{N}(0,\Sigma)$,
we merely have to consider the basis-free versions of the
hermitian and unitary matrices,
$U_j = 2(P_1 + \dots + P_j) - 1_N$, $j = 1,\dots,c$, which
realize our block-diagonalization. Taking into account the first
part of Lemma 3.5 below, which contains the description of the
symmetries of the channel at our disposal, we conclude that
Theorem 3.2 extends mutatis mutandis to the general situation.
The only change is that we drop the condition of separability
supposed in the statement of Theorem 3.2.

Theorem 3.3: Let $H \sim \mathcal{N}(0,\Sigma)$ be a random
$M \times N$ channel matrix and suppose that
$C_\Sigma \neq \mathbb{C}1_N$. Then the capacity
achieving covariance matrix $Q$ can be chosen such that

$$ Q = U\,\mathrm{diag}(Q_1,\dots,Q_c)\,U^H, $$

where $Q_j$ maps the range of $P_j$ into itself,
$\{P_j\}_{j=1}^{c}$ denotes any resolution of identity consisting
of minimal projections in $C_\Sigma^t$, and $U$ is any unitary
matrix which diagonalizes all $P_j$ simultaneously.

We now use Theorem 3.3 for a further analysis of our
optimization problem. We use the structure

$$ Q = [U^1,\dots,U^c]\,\mathrm{diag}(Q_1,\dots,Q_c)\,[U^1,\dots,U^c]^H $$

of the optimal transmit covariance matrix $Q$. The block $Q_i$
has the dimension $l_i \times l_i$ and the corresponding block
$U^i$ of the unitary matrix $U$ has the size $N \times l_i$. We
have $\sum_{i=1}^{c} l_i = N$. If we use the matrix
$H_i = HU^i$, then we have for the optimal transmit covariance
matrix

$$ C = I(Q) = \mathbb{E}\left(\log\det\left(1_M + \frac{1}{\sigma_n^2}\sum_{i=1}^{c} H_iQ_iH_i^H\right)\right). $$

Thus the optimal block matrix $\mathrm{diag}(Q_1,\dots,Q_c)$ can
be calculated as the solution of

$$ \max_{Q_i \ge 0,\; \sum_{i=1}^{c}\mathrm{tr}(Q_i) \le p} \mathbb{E}\left(\log\det\left(1_M + \frac{1}{\sigma_n^2}\sum_{i=1}^{c} H_iQ_iH_i^H\right)\right). $$
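The rewriting of $HQH^H$ as a sum over blocks is a purely algebraic identity and can be verified for one channel realization. The sketch below assumes NumPy; the block sizes, the random unitary $U$ and the helper `crandn` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
M, sizes = 3, [2, 1, 2]                          # l_1, l_2, l_3 (illustrative)
N = sum(sizes)

def crandn(*shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

U, _ = np.linalg.qr(crandn(N, N))                # some unitary N x N matrix
H = crandn(M, N)                                 # one channel realization
blocks = []
for l in sizes:
    Z = crandn(l, l)
    blocks.append(Z @ Z.conj().T)                # PSD block Q_i of size l_i x l_i

# Assemble Q = U diag(Q_1, ..., Q_c) U^H.
edges = np.concatenate(([0], np.cumsum(sizes)))
D = np.zeros((N, N), dtype=complex)
for i, Qi in enumerate(blocks):
    D[edges[i]:edges[i + 1], edges[i]:edges[i + 1]] = Qi
Q = U @ D @ U.conj().T

# H_i = H U^i, where U^i holds the columns of U belonging to block i.
H_parts = [H @ U[:, edges[i]:edges[i + 1]] for i in range(len(sizes))]

full = H @ Q @ H.conj().T
split = sum(Hi @ Qi @ Hi.conj().T for Hi, Qi in zip(H_parts, blocks))
terms_agree = np.allclose(full, split)
```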



As a consequence of this simple observation and Theorem
3.3 we obtain the following corollary.

Corollary 3.4: The block matrix $\mathrm{diag}(Q_1,\dots,Q_c)$ is
the optimal block matrix if and only if there exist a
$\mu \ge 0$ and positive semidefinite matrices
$\Psi_1,\dots,\Psi_c$ such that $Q_k \ge 0$, $1 \le k \le c$,

$$ \frac{1}{\sigma_n^2}\,\mathbb{E}\left(H_k^H\Big(1_M + \frac{1}{\sigma_n^2}\sum_{l=1}^{c} H_lQ_lH_l^H\Big)^{-1}H_k\right) = \mu 1_{l_k} - \Psi_k, \qquad \mathrm{tr}(\Psi_kQ_k) = 0, \quad 1 \le k \le c, $$

and

$$ \sum_{l=1}^{c} \mathrm{tr}(Q_l) = p $$

hold.

Remark: For the classical correlation scenario
$\Sigma = R \otimes T$ we have again $c = N$,
$l_1 = \dots = l_N = 1$, and $Q = \mathrm{diag}(p_1,\dots,p_N)$,
$p_i \ge 0$, where the $p_i$ are the solution of the
well-known power optimization problem [7], [8].
The following Lemma 3.5 gives a further description of the
optimal transmit covariance matrices.

Lemma 3.5: Consider any $M \times N$ random channel matrix
$H \sim \mathcal{N}(0,\Sigma)$ and let $U$ be a unitary
$N \times N$ matrix. Then:

1. The channel matrices $H$ and $HU$ have equal probability
density functions iff $U^t \in C_\Sigma$, or equivalently
$U \in C_\Sigma^t$.

2. If $Q^{(1)}$ and $Q^{(2)}$ are capacity achieving PSD
matrices, i.e. $C(Q^{(1)}) = C(Q^{(2)})$, with
$\mathrm{tr}(Q^{(1)}) = p = \mathrm{tr}(Q^{(2)})$, then

$$ HQ^{(1)}H^H = HQ^{(2)}H^H \quad\text{a.s.}, \tag{6} $$

with respect to the law of $H$.
Proof: 1. The first statement is easily obtained by a change
of variables. For the reader's convenience we give some crucial
steps: First, the variances $\Sigma$ of $H$ resp. $\Sigma_U$ of
$HU$ are related by
$\Sigma_U = (1_M \otimes U^{tH})\,\Sigma\,(1_M \otimes U^t)$. This
can be easily verified using the change of variables formula and
observing that each tensor product
$A \otimes B \in M(M,\mathbb{C}) \otimes M(N,\mathbb{C})$
canonically induces a linear map on $M \times N$ matrices by the
assignment $H \mapsto AHB^t$. Note that the probability density
function of the channel matrix can be written as

$$ f(H) = \frac{1}{K}\,e^{-\langle H, \Sigma^{-1}H\rangle_{HS}}, $$

where $\langle\cdot,\cdot\rangle_{HS}$ denotes the Hilbert-Schmidt
inner product and $K$ is the normalization constant. The
conclusion of the first part of the lemma is now obvious.

2. According to our assumption and due to the concavity of
the capacity functional we may conclude that

$$ C = C\big(\tfrac12Q^{(1)} + \tfrac12Q^{(2)}\big) = \tfrac12C(Q^{(1)}) + \tfrac12C(Q^{(2)}). $$

Moreover, since the functional $\log\det(\cdot)$ is concave, we
see that for $Q = \frac12(Q^{(1)} + Q^{(2)})$

$$ \log\det\Big(1_M + \frac{HQH^H}{\sigma_n^2}\Big) \ge \frac12\log\det\Big(1_M + \frac{HQ^{(1)}H^H}{\sigma_n^2}\Big) + \frac12\log\det\Big(1_M + \frac{HQ^{(2)}H^H}{\sigma_n^2}\Big). $$

These two relations lead immediately to the conclusion that
equality holds in the last inequality almost surely with respect
to the probability distribution of the channel matrix $H$. This,
in turn, is equivalent to

$$ \det\Big(1_M + \frac{HQH^H}{\sigma_n^2}\Big) = \det\Big(1_M + \frac{HQ^{(1)}H^H}{\sigma_n^2}\Big)^{1/2} \times \det\Big(1_M + \frac{HQ^{(2)}H^H}{\sigma_n^2}\Big)^{1/2} \tag{7} $$

almost surely. Now recall Minkowski's determinant inequality
and the log-concavity of the determinant (cf. [13]), which can
be stated as the following chain of inequalities:

$$ \det(\lambda A + (1-\lambda)B) \ge \big(\lambda\det(A)^{1/M} + (1-\lambda)\det(B)^{1/M}\big)^{M} \ge \det(A)^{\lambda}\det(B)^{1-\lambda}, \tag{8} $$

for $\lambda \in (0,1)$ and $A, B \in M(M,\mathbb{C})$ positive
definite. Equality holds in the first inequality iff $A = aB$
with $a > 0$, while equality in the second inequality is obtained
iff $\det(A) = \det(B)$. Hence overall equality in (8) can appear
iff $A = B$. Translating this to our eqn. (7) we see that

$$ 1_M + \frac{HQ^{(1)}H^H}{\sigma_n^2} = a(H)\Big(1_M + \frac{HQ^{(2)}H^H}{\sigma_n^2}\Big) $$

a.s. with a measurable function $a$ which is almost surely
positive, and

$$ \det\Big(1_M + \frac{HQ^{(1)}H^H}{\sigma_n^2}\Big) = \det\Big(1_M + \frac{HQ^{(2)}H^H}{\sigma_n^2}\Big) \quad\text{a.s.} $$

Together these give $a(H) = 1$ a.s., and hence

$$ HQ^{(1)}H^H = HQ^{(2)}H^H \quad\text{a.s.} $$

□



Remark: As the proof shows, the second part of Lemma 3.5
also gives us a necessary and sufficient condition for
equality in the concavity of the capacity functional.
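The chain of determinant inequalities (8) used in the proof of Lemma 3.5 can be spot-checked on random positive definite matrices. The sketch below assumes NumPy; the function names, the matrix size and the choice $\lambda = 0.3$ are illustrative, and a numerical check of this kind is of course no substitute for the proof in [13].

```python
import numpy as np

def chain(A, B, lam):
    """Return the three quantities in (8), from left to right."""
    M = A.shape[0]
    dA, dB = np.linalg.det(A).real, np.linalg.det(B).real
    left = np.linalg.det(lam * A + (1 - lam) * B).real
    middle = (lam * dA ** (1 / M) + (1 - lam) * dB ** (1 / M)) ** M
    right = dA ** lam * dB ** (1 - lam)
    return left, middle, right

rng = np.random.default_rng(4)

def random_pd(M):
    """Random hermitian positive definite M x M matrix."""
    Z = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
    return Z @ Z.conj().T + np.eye(M)

A, B = random_pd(3), random_pd(3)
left, middle, right = chain(A, B, 0.3)
inequalities_hold = left >= middle >= right

# For A = B all three quantities coincide, the equality case of (8).
eq_left, eq_middle, eq_right = chain(A, A, 0.3)
equality_case = np.isclose(eq_left, eq_middle) and np.isclose(eq_middle, eq_right)
```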

IV. Conclusion 

We have described the structure of optimal input covariance
matrices using the symmetries of the channel matrix $H$ at
our disposal. Those symmetries are encoded in the right
commutant $C_\Sigma$. If $C_\Sigma \neq \mathbb{C}1_N$, the
original optimization problem reduces to independent optimization
problems coupled only through the trace constraint of
Corollary 3.4.